Multi-level Schema Extraction for Heterogenous Semi-structured Data
Similar resources
Automatic integration of Heterogenous XML-schemas
Due to XML’s flexibility and semi-structured nature, complications arise when trying to transplant data from one XML document to another. Researchers have made great strides in solving the problem of integrating homogeneous XML, but very few works specifically address the problem of integrating heterogeneous documents. We introduce XSD Matcher, a system for automatically mapping a collection of...
Semi-Structured Data Extraction and Schema Knowledge Mining
It is well known that the World Wide Web has become a huge information resource, so utilizing this information effectively is very important. This paper proposes a semi-structured data extraction method that retrieves the useful information embedded in a group of related web pages and stores it using OEM (Object Exchange Model). We then adopt a data mining method to discover schema...
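To make the extract, store, and mine pipeline described above more concrete, here is a minimal Python sketch under stated assumptions. It assumes the data extracted from each page arrives as nested dicts, models OEM objects as simple labeled nodes (the names `OEMObject`, `to_oem`, `label_paths`, and `mine_schema` are hypothetical), and substitutes a plain support-threshold count over label paths for the unspecified "data mining method". It is an illustration, not the cited paper's algorithm.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, Union


# Simplified OEM-style object (assumption): each object has an oid and a label
# and holds either an atomic value or a list of child objects.
@dataclass
class OEMObject:
    oid: int
    label: str
    value: Union[str, int, float, None] = None
    children: list[OEMObject] = field(default_factory=list)


def to_oem(label: str, data, ids: Iterator[int]) -> OEMObject:
    """Convert nested dicts (hypothetical extracted page fields) into OEM objects."""
    obj = OEMObject(next(ids), label)
    if isinstance(data, dict):
        for key, val in data.items():
            # A list value becomes several children sharing the same label.
            items = val if isinstance(val, list) else [val]
            obj.children.extend(to_oem(key, item, ids) for item in items)
    else:
        obj.value = data
    return obj


def label_paths(obj: OEMObject, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield every root-to-node label path of an OEM object."""
    path = prefix + (obj.label,)
    yield path
    for child in obj.children:
        yield from label_paths(child, path)


def mine_schema(pages: list[OEMObject], min_support: float = 0.5) -> list[tuple]:
    """Stand-in for schema mining: keep label paths found in >= min_support of pages."""
    support = Counter()
    for page in pages:
        support.update(set(label_paths(page)))  # count each path once per page
    threshold = min_support * len(pages)
    return sorted(p for p, n in support.items() if n >= threshold)


ids = count(1)
pages = [
    to_oem("book", {"title": "A", "author": ["X", "Y"], "price": 10}, ids),
    to_oem("book", {"title": "B", "author": "Z"}, ids),
    to_oem("book", {"title": "C", "author": "W", "isbn": "123"}, ids),
]
for path in mine_schema(pages, min_support=0.6):
    print(".".join(path))  # prints book, book.author, book.title
```

Raising or lowering `min_support` trades a compact schema covering only the common structure against a fuller schema that also admits rare, page-specific fields such as `price` or `isbn` here.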
An ontology-based approach for resolving semantic schema conflicts in the extraction and integration of query-based information from heterogeneous web data sources
There are many external resources and heterogeneous data on the internet that an organization or user may need to improve the decision-making process. It is therefore critical that this information be complete, precise, and available on time. Most web sources provide data in semi-structured form. The combination of semi-structured data from different so...
Ontology Driven Web Extraction from Semi-structured and Unstructured Data for B2B Market Analysis
The Market Blended Insight project aims to improve UK business-to-business marketing performance using Semantic Web technologies. In this project, we are implementing an ontology-driven web extraction and translation framework to supplement our backend triple store of UK companies, people and geographical information. It deals with both the semi-structured data and the un...
Modelling the Webspace of an Intranet
Searching the internet using the currently available search engines is not satisfactory. The techniques used there focus on the extraction of relevant information directly from the documents available on the web. We introduce a new approach, which aims at describing the content of a webspace, formed by a collection of related documents, instead of looking at the single documents. By identifying...